Data Viz Overview

Data visualization is an essential part of the statistical analysis process, both for exploring data and for summarizing findings.

Exploratory Data Analysis
  • Used for data quality checks

  • Helps explore and understand the data

  • Typically not seen by anyone else

Polished Data Visualization
  • Used to summarize data in presentations or papers

  • Should stand alone with appropriate titles, axes, labels, and captions

Spatial Data Viz Tools

There are many tools for creating spatial figures (GIS software, Tableau, etc.), but we will exclusively use R and the wide range of packages available for it.

In particular, we will use:

  • ggplot2

  • ggmap

  • leaflet

  • RgoogleMaps

  • and many others…

Point Data: What is this?

Point Data: How about now?

Point Data: Is this better?

ggmap Code Overview

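  library(readr)    # read_file()
  library(ggplot2)  # geom_point(), labs(), theme()
  library(ggmap)    # register_google(), get_map(), ggmap()

  # Read the Google Maps API key from a local text file and register it
  mykey <- read_file('./google_api.txt')
  register_google(key = mykey)

  # Download a Google road map centered on New York City
  myMap <- get_map(location = c(lon = -74, lat = 40.75),
                   source = "google",
                   maptype = "roadmap", crop = FALSE,
                   zoom = 11, api_key = mykey)

  # Overlay the pickups; `uber` is assumed to be a data frame that is
  # already loaded, with columns Lon and Lat
  ggmap(myMap) +
    geom_point(aes(x = Lon, y = Lat), alpha = .03, size = .5, data = uber) +
    labs(title = 'Location of Uber pickups on May 1, 2014 for NYC Destinations',
         caption = 'source: https://www.kaggle.com/fivethirtyeight/uber-pickups-in-new-york-city') +
    xlab('') + ylab('') +
    # Hide axis titles, tick labels, and tick marks: the map supplies the context
    theme(axis.title.x = element_blank(),
          axis.text.x = element_blank(),
          axis.ticks.x = element_blank(),
          axis.title.y = element_blank(),
          axis.text.y = element_blank(),
          axis.ticks.y = element_blank())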

Principles for Point Data

  1. Include useful background for appropriate context: there are several approaches for acquiring maps in R. Sometimes streets may be more useful, but in other situations a terrain image might be more relevant.
  2. With point patterns, use transparency or heat map summaries to distinguish between areas of higher and lower intensity.
  3. Include useful titles, labels, and where appropriate, captions (all figures). These figures should stand alone.
  4. Sources should be cited in figures.
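
To make principles 2 through 4 concrete, here is a minimal sketch of the two options, transparency and a heat map summary. The data are simulated (the data frame `pts` and its parameters are made up for illustration and are not the Uber pickups), and the plots omit the background map so that the example runs without an API key.

```r
library(ggplot2)

# Simulated point pattern (hypothetical data): a dense cluster plus
# diffuse background points.
set.seed(534)
pts <- data.frame(
  Lon = c(rnorm(2000, mean = -74.00, sd = 0.02), runif(500, -74.15, -73.85)),
  Lat = c(rnorm(2000, mean = 40.75, sd = 0.02), runif(500, 40.60, 40.90))
)

# Option 1: transparency, so overlapping points build up into darker regions.
p_alpha <- ggplot(pts, aes(x = Lon, y = Lat)) +
  geom_point(alpha = 0.05, size = 0.5) +
  labs(title = "Simulated point pattern with transparency",
       caption = "Source: simulated data")

# Option 2: a heat map summary based on a 2D kernel density estimate.
p_heat <- ggplot(pts, aes(x = Lon, y = Lat)) +
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon", alpha = 0.4) +
  scale_fill_viridis_c(name = "Density level") +
  labs(title = "Simulated point pattern as a density heat map",
       caption = "Source: simulated data")
```

Printing `p_alpha` or `p_heat` draws the figure. The same `geom_point()` and `stat_density_2d()` layers can be added to a `ggmap()` object instead of `ggplot()` when a background map is available.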

Cloning a Repo

Luckily for all of us, this course will not consist of me talking for 75 minutes. Rather, active learning components will be interspersed throughout the day. In general, we will spend some time talking about a topic, and then I will give you time to work through a data visualization or analysis exercise.

To facilitate the active learning sessions, I'd recommend cloning the repo at the start of class. For instance, here is the repo for today: https://github.com/Stat534/Lecture3. Using RStudio to create a local project is one way to do this, but you won't have push access to my repo.

Active Learning Exercise: Seattle Police Calls Data Viz

seattle <- read_csv('./SeattlePolice.csv')
## Parsed with column specification:
## cols(
##   CAD.Event.Number = col_double(),
##   Event.Clearance.Description = col_character(),
##   Event.Clearance.SubGroup = col_character(),
##   Event.Clearance.Group = col_character(),
##   Census.Tract = col_double(),
##   Longitude = col_double(),
##   Latitude = col_double(),
##   Year = col_integer(),
##   Month = col_integer(),
##   Day = col_integer()
## )

Cartography

Distance Calculations

A collaborator suggests that there may be a spatial relationship between the police calls in the Seattle data set. How would you calculate the distance between those points?

Event.Clearance.Description   Longitude   Latitude
TRAFFIC (MOVING) VIOLATION    -122.3392   47.61372
SHOPLIFT                      -122.3384   47.61824
DISTURBANCE, OTHER            -122.3474   47.61271

Distance Calculations, follow up

As a follow-up:

  • In what units is the distance you calculated?
  • Are those units consistent across latitudes?
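
One way to explore these two questions is sketched below. It takes the naive approach of computing Euclidean distance directly on the longitude and latitude columns, then checks how long one degree of longitude is at different latitudes. The helper `km_per_deg_lon()` is a hypothetical function written for this illustration, and it assumes a spherical earth with a radius of 6371 km.

```r
# The three calls from the Seattle data.
calls <- data.frame(
  Description = c("TRAFFIC (MOVING) VIOLATION", "SHOPLIFT", "DISTURBANCE, OTHER"),
  Longitude   = c(-122.3392, -122.3384, -122.3474),
  Latitude    = c(47.61372, 47.61824, 47.61271)
)

# Naive Euclidean distance on the raw coordinates: the result is in degrees.
deg_dist <- dist(calls[, c("Longitude", "Latitude")])

# Approximate length (km) of one degree of longitude at a given latitude,
# assuming a spherical earth (radius_km is an assumption, not a fitted value).
km_per_deg_lon <- function(lat_deg, radius_km = 6371) {
  (pi / 180) * radius_km * cos(lat_deg * pi / 180)
}

# Compare the equator, Seattle's approximate latitude, and 60 degrees north.
km_per_deg_lon(c(0, 47.6, 60))
```

Because the length of a degree of longitude shrinks with the cosine of latitude, a distance expressed in degrees does not correspond to a fixed distance on the ground.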

Map Projections

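Map projections represent a curved surface, such as the earth's, on a plane. Specifically, functions are designed to map the geographical coordinate system \((\lambda, \phi)\) to planar coordinates \((x, y)\) such that:

\[x = f(\lambda, \phi), \; \; y = g(\lambda, \phi),\]

where \(\lambda\) is longitude, \(\phi\) is latitude, and \(f\) and \(g\) are the functions that define the projection.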
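
As one concrete illustration of a pair of functions \(f\) and \(g\), the sketch below implements the spherical Mercator projection. It treats \(\lambda\) as longitude and \(\phi\) as latitude, both in degrees. The function name `mercator()`, the central meridian `lambda0`, and the earth radius of 6371 km are assumptions made for this example; this is not necessarily the projection used later in the course.

```r
# Spherical Mercator projection: one possible choice of f and g.
# lambda = longitude, phi = latitude (degrees); output coordinates in km.
mercator <- function(lambda, phi, lambda0 = 0, radius_km = 6371) {
  lam <- (lambda - lambda0) * pi / 180  # longitude offset in radians
  ph  <- phi * pi / 180                 # latitude in radians
  data.frame(
    x = radius_km * lam,                       # x = f(lambda, phi)
    y = radius_km * log(tan(pi / 4 + ph / 2))  # y = g(lambda, phi)
  )
}

# Project approximate locations in Seattle and New York City.
mercator(lambda = c(-122.3392, -74), phi = c(47.61372, 40.75))
```

Here \(x\) depends only on longitude, while \(y\) stretches increasingly as latitude moves away from the equator, which is why Mercator maps exaggerate areas near the poles.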

UTM projections

Distances on the earth's surface

Distance Metrics

  • Euclidean
  • geodesic
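
To contrast the two metrics, here is a minimal sketch of a geodesic calculation. It uses the haversine formula for great-circle distance, which treats the earth as a sphere of radius 6371 km. The function name `haversine_km()` and that radius are assumptions for this example, and an ellipsoidal model would give slightly different answers.

```r
# Great-circle (geodesic on a sphere) distance via the haversine formula.
# Inputs are in degrees; the result is in kilometers.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Traffic violation call versus shoplift call from the Seattle data.
haversine_km(-122.3392, 47.61372, -122.3384, 47.61824)
```

Unlike Euclidean distance on raw longitude and latitude, the result is in kilometers and accounts for the curvature of the earth, so it can be compared across locations at different latitudes.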

Spatial Data in R

Vector Data

sf package

Additional Vector Geometries

Raster Data

raster package

Map Making with tmap

Additional References